Modeling the Creaky Excitation for Parametric Speech Synthesis
نویسندگان
چکیده
In order to produce natural sounding output, corpus-based speech synthesis systems need to be able to properly model the acoustic variability in the corpus. Creaky voice is a voice quality frequently produced in many languages, in both read and conversational speech settings. However, the creaky excitation displays different acoustic characteristics than modal excitations and is, hence, not suitably modelled by standard vocoders. This study presents an analysis of the creaky excitation which is used to derive an extension of the Deterministic plus Stochastic Model of the residual signal. This proposed model is designed to appropriately model creaky voice and is integrated into a vocoder for parametric speech synthesis. Copy-synthesis versions of short speech segments containing creaky voice were used in a subjective listening test which revealed clearly better rendering of the voice quality than a standard vocoder.
منابع مشابه
Residual-Based Excitation with Continuous F0 Modeling in HMM-Based Speech Synthesis
In statistical parametric speech synthesis, creaky voice can cause disturbing artifacts. The reason is that standard pitch tracking algorithms tend to erroneously measure F0 in regions of creaky voice. This pattern is learned during training of hidden Markov-models (HMMs). In the synthesis phase, false voiced / unvoiced decision caused by creaky voice results in audible quality degradation. In ...
متن کاملHMM-based synthesis of creaky voice
Creaky voice, also referred to as vocal fry, is a voice quality frequently produced in many languages, in both read and conversational speech. To enhance the naturalness of speech synthesis, these latter should be able to generate speech in all its expressive diversity, including creaky voice. The present study looks to exploit our recent developments, including creaky voice detection, predicti...
متن کاملStatistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters
This paper describes a novel framework for statistical parametric speech synthesis in which statistical modeling of the speech waveform is performed through the joint estimation of acoustic and excitation model parameters. The proposed method combines extraction of spectral parameters, considered as hidden variables, and excitation signal modeling in a fashion similar to factor analyzed traject...
متن کاملPhase Modeling Using Integrated Linear Prediction Residual for Statistical Parametric Speech Synthesis
The conventional statistical parametric speech synthesis (SPSS) focus on characteristics of the magnitude spectrum of speech for speech synthesis by ignoring phase characteristics of speech. In this work, the role of phase information to improve the naturalness of synthetic speech is explored. The phase characteristics of excitation signal are estimated from the integrated linear prediction res...
متن کاملAutomatic detection of creaky voice using epoch parameters
This paper proposes a method based on epoch parameters for detection of creaky voice in speech signal. The epoch parameters characterizing the source of excitation considered in this work are number of epochs in a frame, strength of excitation of epochs and epoch intervals. Analysis of epoch parameters estimated from zero-frequency filtering method with different window sizes is carried out. Di...
متن کامل